Abstract



We propose a method for setting limits that avoids excluding parameter values for which the sensitivity falls below a specified threshold. These "power-constrained" limits (PCL) address the issue that motivated the widely used CLs procedure [1], but do so in a way that makes the properties of the statistical test to which each value of the parameter is subjected more transparent. A case of particular interest is for upper limits on parameters that are proportional to the cross section of a process whose existence is not yet established. The basic idea of the power constraint can easily be applied, however, to other types of limits.



1 Introduction 



In particle physics experiments one often tests specific models that predict new phenomena. Some regions of a model's parameter space may be rejected by these tests; in other regions the model may be deemed compatible with the data. This is often done in the framework of a frequentist statistical test, which is inverted to determine a confidence interval. This formalism is reviewed in Sec. 2.

It is generally the case that for some parameter values of a signal model, the magnitude of the predicted effect with respect to the background-only model is extremely small. That is, one has effectively no experimental sensitivity to those parts of the model's parameter space. Nevertheless, procedures based on frequentist tests may exclude these values. We discuss how this can occur and how it has been dealt with in the past in Sections 3 and 4.

In Sec. 5 we introduce a new method for constraining confidence intervals in a way that prevents one from excluding parameter values to which one does not have sufficient sensitivity. As the measure of sensitivity is based on the power of a statistical test, we refer to the bounds established by these modified intervals as power-constrained limits (PCL).

Section 6 illustrates the procedure for the case of an upper limit derived from a Gaussian measurement. Section 7 discusses the distribution of the upper limit and the choice of the minimum power. Section 8 discusses how the procedure can be applied in cases where there are additional nuisance parameters, beyond the parameters of interest, that must be fitted using the data. A summary and conclusions are given in Sec. 9.

2 Confidence intervals from inverting a statistical test 

In this section we review the formalism of inverting a frequentist statistical test to obtain a 
confidence interval. A more thorough treatment can be found in many texts, such as Ref. [2]. 

We consider a test for a parameter μ, which here represents the signal strength (or any parameter proportional to the rate) of a certain signal process. A test of a given μ is carried out by specifying a region of data outcomes, called the critical region, that are disfavoured, in a sense discussed below, under assumption of μ. The data outcome could be, for example, the number of events observed in a given region of phase space, or it could represent a larger set of numerical values. Here we will use x to represent the data, and w_μ to denote the critical region.

The critical region is chosen such that the probability to observe the data in it, under assumption of the hypothesized μ, is not greater than a given constant α, called the size or significance level of the test, i.e.,

P(x \in w_\mu \mid \mu) \le \alpha .    (1)



Often by convention α = 0.05 is used. If the data are observed in the critical region, the hypothesis μ is rejected. It is necessary in general to specify Eq. (1) as an inequality because the data may be discrete (e.g., an integer number of events), and so there may not exist a subset of the possible data values for which the summed probability is exactly equal to α.
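As a minimal sketch of why Eq. (1) is an inequality, assume a counting experiment in which the number of events n is Poisson distributed with a hypothesized mean ν (a hypothetical value ν = 10 below) and in which, as for an upper limit, low counts are disfavoured. The largest critical region of the form n ≤ k_c whose size does not exceed α then has a size strictly below α. The function names are illustrative only.

```python
import math

def poisson_pmf(k, nu):
    """Probability of observing k events when the Poisson mean is nu."""
    return math.exp(-nu) * nu**k / math.factorial(k)

def lower_critical_region(nu, alpha=0.05):
    """Return (k_c, size): the largest k_c with P(n <= k_c | nu) <= alpha.

    Returns (-1, 0.0) if even n = 0 alone has probability above alpha.
    """
    k_c, size = -1, 0.0
    k, cumulative = 0, poisson_pmf(0, nu)
    while cumulative <= alpha:
        k_c, size = k, cumulative      # region n <= k is still admissible
        k += 1
        cumulative += poisson_pmf(k, nu)
    return k_c, size

k_c, size = lower_critical_region(10.0)
print(k_c, round(size, 4))  # 4 0.0293
```

For ν = 10 the region n ≤ 4 has a size of about 0.029, while adding n = 5 would raise it to about 0.067, so no region of this form has a size of exactly 0.05.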

It is convenient to construct from the data a test statistic q_μ such that greater values of q_μ reflect an increasing level of incompatibility between the data and the hypothesized parameter value μ. In this way the boundary of the critical region in data space is given by a surface of constant q_μ, with the critical region containing the data that give the greatest values of q_μ. Once such a function has been defined, one can for any observed value q_μ,obs compute a p-value, i.e., the probability under assumption of μ to find data with equal or greater incompatibility with μ,



p_\mu = \int_{q_{\mu,\mathrm{obs}}}^{\infty} f(q_\mu \mid \mu) \, dq_\mu ,    (2)



where f(q_μ|μ) represents the probability density function (pdf) of q_μ assuming a data distribution with strength parameter μ. Thus the test can be equivalently formulated by rejecting μ if its p-value is found to be less than α.
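When f(q_μ|μ) is not known in closed form, Eq. (2) is often estimated from simulated ("toy") values of q_μ generated under μ. The sketch below assumes such toys are available; as a hypothetical check it draws them from an exponential distribution with unit mean, for which the exact p-value is exp(−q_μ,obs). The observed value 4.0 is likewise an arbitrary choice.

```python
import math
import random

def p_value_from_toys(q_obs, toys):
    """Monte Carlo estimate of Eq. (2): fraction of toy q_mu values >= q_obs."""
    if not toys:
        raise ValueError("need at least one toy value of q_mu")
    return sum(1 for q in toys if q >= q_obs) / len(toys)

# Hypothetical toys of q_mu generated under mu (exponential pdf with unit mean).
rng = random.Random(1)
toys = [rng.expovariate(1.0) for _ in range(100_000)]

q_obs = 4.0
p_mu = p_value_from_toys(q_obs, toys)
print(p_mu, math.exp(-q_obs))   # Monte Carlo estimate versus exact value
print("reject mu" if p_mu < 0.05 else "do not reject mu")
```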

A test of size α can be carried out for all values of μ. The set of values not rejected constitutes a confidence interval for μ with confidence level 1 − α. This interval will by construction include the true value of the parameter with a probability of at least 1 − α.
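Inverting the test can be sketched numerically by scanning μ over a grid and keeping the values that are not rejected. As an assumption for illustration, the sketch uses the Gaussian measurement treated in Sec. 6 (an estimator μ̂ with known σ, where low values of μ̂ disfavour μ), so that p_μ = Φ((μ̂ − μ)/σ). The grid range and spacing, and the observed μ̂ = 0.5, are arbitrary choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def p_value_upper(mu, mu_hat, sigma):
    """One-sided p-value of mu when low values of mu_hat are disfavoured."""
    return STD_NORMAL.cdf((mu_hat - mu) / sigma)

def invert_test(mu_hat, sigma, alpha=0.05, grid=None):
    """Return the grid values of mu that are not rejected at size alpha."""
    if grid is None:
        grid = [i * 0.001 for i in range(-5000, 5001)]
    return [mu for mu in grid if p_value_upper(mu, mu_hat, sigma) >= alpha]

accepted = invert_test(mu_hat=0.5, sigma=1.0)
print(min(accepted), max(accepted))  # one-sided set; upper end near 0.5 + 1.645
```

Because this test rejects μ only for low μ̂, the accepted set is one-sided, and its upper end reproduces the upper limit μ̂ + σΦ^{-1}(1 − α) of Sec. 6 up to the grid spacing.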

The procedure described above for constructing a confidence interval by inverting a test is not unique, however, because there are (often infinitely) many different subsets of the data space that could be chosen for the test's critical region w_μ. This is usually selected such that the probability to find x ∈ w_μ is large if a given alternative hypothesis (or set of alternatives) is true. The power of the test of μ with respect to an alternative value μ′ of the parameter, which we denote here as M_μ′(μ), is

M_{\mu'}(\mu) = P(x \in w_\mu \mid \mu') .    (3)

If the test of μ is formulated using a p-value, such that finding p_μ < α is equivalent to finding x ∈ w_μ, then the power can be written equivalently as

M_{\mu'}(\mu) = P(p_\mu < \alpha \mid \mu') .    (4)



Often the power with respect to certain alternatives is used as the criterion according to 
which one chooses the critical region of a test. Confidence intervals obtained from inverting 
the test thus depend on this choice. For the present discussion, however, we will assume that 
the test has been defined, and the power will be used only to modify the resulting confidence 
interval so that it does not exclude parameter values to which one does not have sufficient 
sensitivity. This concept is defined more quantitatively in the following section. 

3 Spurious exclusion 

When testing a hypothesized strength parameter μ, it may be that the magnitude of the signal implied by μ is extremely small, so small that the probabilities for the data are very close to what they would be in the absence of the signal process, i.e., μ = 0. In such a case one has little or no sensitivity to the given value of μ.

For example, Fig. 1(a) illustrates a situation where there is only a very small level of sensitivity to a given strength parameter μ. The plot shows the pdfs of the statistic q_μ under assumption of the strength parameter μ and also assuming μ = 0, i.e., f(q_μ|μ) and f(q_μ|0). If the observed value of the statistic is found in the critical region corresponding to the top 5% of f(q_μ|μ), then the hypothesized μ is rejected. But as the two pdfs almost coincide, the probability to reject μ if the true strength parameter is zero is also close to α = 0.05.

Figure 1(b) shows the same distributions as (a) but for a different value of μ. The size of the test is, as in (a), equal to α. Here, however, the distribution of q_μ under the assumption of μ′ = 0 leads to a substantially greater probability to reject μ, i.e., to find q_μ in the critical region.

Figure 1: Illustration of statistical tests of parameter values μ for the cases of (a) little sensitivity and (b) substantial sensitivity (see text).

The sensitivity of a test of μ can be quantified using the power of the test with respect to a stated alternative μ′, which we will take here to be the no-signal hypothesis, μ′ = 0. In the case where the pdfs f(q_μ|μ) and f(q_μ|0) coincide, the probability to reject μ assuming the alternative μ′ = 0 approaches the significance level of the test, α.

In the context of a search for a new phenomenon, this means that with probability not less than α one will exclude hypotheses to which one has little or no sensitivity, which we refer to here as spurious exclusion. The hypothesis might indeed be false, but if it is excluded, this is more naturally interpreted as a data fluctuation away from the region favoured under assumption of μ. This could result, for example, in a search for a hypothetical particle with a mass far above the range where it would have a noticeable impact on the data. Particle physics experiments often carry out many searches covering a broad parameter range for many signal models, and so spurious exclusion is in fact a problem that can arise often.



4 Previous methods that address spurious exclusion 

The problem of spurious exclusion, or equivalently, having a "lucky" statistical fluctuation lead to an anomalously strong limit, has been known in the particle physics community for many years. The note by Highland [3] reviews the problem and proposes several possible solutions; further discussion can be found in the review on statistics by the Particle Data Group [4].

The problem received particular focus during searches for the Higgs boson at the LEP collider in the 1990s, and led to a procedure called "CLs" [1]. Here one forms the ratio
\mathrm{CL}_s = \frac{p_\mu}{1 - p_0} ,    (5)

where p_μ and p_0 are the p-values of the hypothesized strength parameter values μ and 0, respectively. In the CLs procedure, μ is deemed to be excluded if one finds CLs < α. Because 1 − p_0 < 1, CLs is always greater than p_μ, and so the probability of exclusion assuming μ is necessarily less than α. Thus the quoted upper limit from the CLs procedure will be greater than the upper limit according to the method of Sec. 2, and in this sense the CLs procedure is said to be conservative. This is illustrated in the example described in Sec. 6.
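For a concrete comparison, the sketch below assumes the Gaussian measurement of Sec. 6 (an estimator μ̂ with known σ, where small μ̂ disfavours μ). Then p_μ = Φ((μ̂ − μ)/σ) and 1 − p_0 = Φ(μ̂/σ), and the CLs upper limit is the value of μ at which Eq. (5) equals α, found here by bisection. The function names are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist().cdf

def cls(mu, mu_hat, sigma):
    """Eq. (5), CL_s = p_mu / (1 - p_0), for a Gaussian estimate mu_hat with known sigma."""
    p_mu = PHI((mu_hat - mu) / sigma)       # small mu_hat disfavours mu
    one_minus_p0 = PHI(mu_hat / sigma)      # P(mu_hat' <= mu_hat | mu = 0)
    return p_mu / one_minus_p0

def cls_upper_limit(mu_hat, sigma, alpha=0.05, tol=1e-10):
    """Solve CL_s(mu) = alpha by bisection; CL_s(0) = 1 and CL_s decreases with mu.

    Assumes mu_hat is not so negative that PHI(mu_hat / sigma) underflows to zero.
    """
    lo, hi = 0.0, max(mu_hat, 0.0) + 20.0 * sigma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(mid, mu_hat, sigma) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

classical = 0.0 + NormalDist().inv_cdf(0.95)   # unconstrained limit for mu_hat = 0, sigma = 1
print(cls_upper_limit(0.0, 1.0), classical)     # about 1.960 versus about 1.645
```

For μ̂ = 0 and σ = 1 the CLs limit is Φ^{-1}(0.975) ≈ 1.96, compared with 1.64 from the unconstrained test, which illustrates the conservatism described above.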

Because of this conservatism, the frequentist coverage probability of the CLs upper limits (i.e., the probability under assumption of μ that the interval will contain μ) is not equal to 1 − α, but is in general larger. Although the exact coverage probabilities of CLs intervals can be found as a function of μ, this is not often reported.

5 Power Constrained Limits 

Here we propose an alternative procedure for producing intervals whose coverage properties are easily apparent for all values of μ. To do this we divide the range of μ to be tested into two categories based on the power M_0(μ) of a test of μ with respect to the no-signal alternative, μ′ = 0. If this power is below a specified threshold M_min, one's sensitivity to this parameter value is deemed to be too low and the point is not regarded as testable. If the power is greater than or equal to the threshold, then the test of size α is carried out. A value of μ is excluded if

(a) the value μ is rejected by the test, i.e., x ∈ w_μ or equivalently p_μ < α, and

(b) one has sufficient sensitivity to μ, i.e., M_0(μ) ≥ M_min.

An interval is constructed from the values of μ not excluded. If this is done on the basis of the test (a) only, it is referred to here as an unconstrained interval. Application of the power constraint (b) results in the power-constrained interval or limit.

The coverage probability of the power-constrained interval is 100% for μ values that have power below M_min, and 1 − α for those values with power greater than or equal to the threshold. When reporting the result it is recommended to indicate which parameter values were above and which below the power-constraint threshold; in this way one can easily see what the coverage probability is for all values of μ.

The choice of the minimum power threshold is a matter of convention. We prefer to use M_min = 0.16, or more precisely, M_min = Φ(−1) = 0.1587, where Φ is the standard normal cumulative distribution (i.e., the cumulative distribution for a Gaussian with zero mean and unit standard deviation). As shown below, this corresponds to applying the power constraint if the unconstrained limit fluctuates more than one standard deviation below its median value under the background-only hypothesis.

This procedure bears some similarity to one introduced recently in the astrophysics community in Ref. [5], although there the power refers to a test of the background-only (μ = 0) hypothesis, and furthermore the result is not used in quite the same way as what we propose here. Note also that in Ref. [5] the term "upper bound" is similar to what we call an upper limit, and the term "upper limit" is taken to refer to the sensitivity threshold.

Formally, to construct the interval for μ one begins by finding the power for a test of each μ with respect to the alternative μ′ = 0,

M_0(\mu) = P(x \in w_\mu \mid 0) = P(p_\mu < \alpha \mid 0) .    (6)

In some problems this can be found in closed form; otherwise it can be obtained using a Monte Carlo calculation in which, for every value of μ, one calculates the distribution of p_μ using data generated according to μ = 0. The value M_0(μ) is then found simply by integrating each distribution from zero up to the desired significance level α (e.g., 0.05).
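A minimal Monte Carlo sketch of this calculation, assuming for illustration the Gaussian measurement of Sec. 6 with σ = 1 and α = 0.05: toy values of μ̂ are generated under μ = 0, the one-sided p-value p_μ is computed for each, and M_0(μ) is estimated as the fraction of toys with p_μ < α. The closed-form power derived in Sec. 6 is included only as a cross-check; the toy count and seed are arbitrary.

```python
import random
from statistics import NormalDist

ND = NormalDist()

def power_mc(mu, sigma=1.0, alpha=0.05, n_toys=100_000, seed=42):
    """Monte Carlo estimate of Eq. (6), M_0(mu) = P(p_mu < alpha | 0).

    Toy measurements mu_hat are generated under mu = 0; for each, the one-sided
    p-value p_mu = Phi((mu_hat - mu)/sigma) is computed (low mu_hat disfavours mu).
    """
    rng = random.Random(seed)
    n_reject = 0
    for _ in range(n_toys):
        mu_hat = rng.gauss(0.0, sigma)
        if ND.cdf((mu_hat - mu) / sigma) < alpha:
            n_reject += 1
    return n_reject / n_toys

def power_closed_form(mu, sigma=1.0, alpha=0.05):
    """Cross-check for this Gaussian model: Phi(mu/sigma - Phi^{-1}(1 - alpha))."""
    return ND.cdf(mu / sigma - ND.inv_cdf(1.0 - alpha))

for mu in (0.0, 1.0, 2.0):
    print(mu, power_mc(mu), power_closed_form(mu))
```

The estimates agree with the closed form within the statistical precision set by n_toys; in problems without a closed form only the Monte Carlo part applies.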

An equivalent and in some ways simpler procedure is first to carry out the statistical test without the power constraint, and invert this to find the unconstrained confidence interval for μ. Some of the parameter values that are excluded from this interval may be found to have a power below the required threshold, and they are then re-included in the power-constrained interval, which is thus by construction a superset of the unconstrained one.

For example, one may be interested in finding an upper limit, μ_up, i.e., the largest value of μ not excluded. By inverting the test, one determines μ_up as a function of the data.

One can therefore determine the distribution of μ_up, e.g., by simulating the experiment many times under assumption of μ = 0 and constructing a histogram of μ_up from the outcomes. Then for each value of μ one determines the corresponding power. This is the probability, under assumption of the background-only (μ = 0) hypothesis, to reject μ, i.e., to find μ outside of the unconstrained confidence interval. In the case of an upper limit this is

M_0(\mu) = P(\mu_{\mathrm{up}} < \mu \mid 0) .    (7)

One should note the following caveat: it can be that for certain data outcomes, all values of μ are excluded by the test, in which case μ_up is not defined. In such cases one must count the outcomes as contributing to the probability that μ is outside the confidence interval.
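Eq. (7), together with this caveat, can be sketched as a counting rule over toy experiments generated under μ = 0. The toy upper limits below are hypothetical numbers, and an entry of None is an illustrative convention for a toy in which every value of μ was excluded.

```python
from typing import Optional, Sequence

def power_from_upper_limits(mu: float, upper_limits: Sequence[Optional[float]]) -> float:
    """Estimate M_0(mu) = P(mu_up < mu | 0), Eq. (7), from toys generated under mu = 0.

    None marks a toy in which all mu were excluded (mu_up undefined); such toys
    count as having mu outside the confidence interval.
    """
    if not upper_limits:
        raise ValueError("need at least one toy experiment")
    n_outside = sum(1 for up in upper_limits if up is None or up < mu)
    return n_outside / len(upper_limits)

# Hypothetical toy results (None = every value of mu excluded in that toy).
toys = [0.8, 1.2, None, 2.5, 1.9, 0.4, 3.1, None, 1.6, 2.2]
print(power_from_upper_limits(1.0, toys))  # 0.4
```

Treating undefined limits as exclusions keeps the estimated power from being biased low, which would otherwise enlarge the region where the constraint is imposed.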

With this in mind, one can then find the smallest value of μ for which the power M_0(μ) is at least equal to the minimum value M_min, denoted here as μ_min. The power-constrained limit μ*_up is given by the larger of the unconstrained limit μ_up and the minimum value to which one has sensitivity, μ_min:

\mu_{\mathrm{up}}^{*} = \max(\mu_{\mathrm{up}}, \mu_{\min}) .    (8)
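A minimal sketch of these two steps, assuming the power is available as a function of μ (from a closed form or from a Monte Carlo estimate) and is evaluated on a grid: μ_min is the first grid value meeting the threshold, and Eq. (8) takes the larger of the two limits. The power curve used here is a hypothetical stand-in, the Gaussian case with σ = 1 and α = 0.05.

```python
from statistics import NormalDist

def mu_min_from_power(power, grid, m_min):
    """Smallest grid value of mu whose power M_0(mu) is at least m_min.

    For a power that increases with mu, every larger grid value also passes.
    """
    for mu in sorted(grid):
        if power(mu) >= m_min:
            return mu
    raise ValueError("the power never reaches m_min on this grid")

def power_constrained_limit(mu_up, mu_min):
    """Eq. (8): the larger of the unconstrained limit and mu_min."""
    return max(mu_up, mu_min)

# Hypothetical power curve: the Gaussian case with sigma = 1 and alpha = 0.05.
ND = NormalDist()

def gauss_power(mu):
    return ND.cdf(mu - ND.inv_cdf(0.95))

grid = [i * 0.001 for i in range(5001)]
mu_min = mu_min_from_power(gauss_power, grid, m_min=ND.cdf(-1.0))
print(mu_min)                                   # close to 0.645
print(power_constrained_limit(0.2, mu_min))     # constraint imposed: returns mu_min
print(power_constrained_limit(2.0, mu_min))     # 2.0, the unconstrained limit is kept
```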



6 PCL for an upper limit based on a Gaussian measurement 

Often the test of μ is based on a Gaussian distributed measurement. For example, for a sufficiently large data sample and under conditions often satisfied in practice, the distribution of the maximum-likelihood estimator μ̂ has a Gaussian form with standard deviation σ and is centred about the true μ. Here we will assume this is the case and further take σ to be known.

For the case of an upper limit, we define the critical region to contain the lowest values of μ̂ such that the probability to find μ̂ there is equal to α. For Gaussian distributed μ̂ with mean μ and standard deviation σ, the critical region is therefore

\hat{\mu} < \mu - \sigma \Phi^{-1}(1 - \alpha) ,    (9)

where Φ^{-1} is the inverse of the standard Gaussian cumulative distribution (the standard normal quantile). For example, α = 0.05 gives Φ^{-1}(1 − α) = 1.64.

Rejecting μ if the data are in the critical region gives the unconstrained upper limit,

\mu_{\mathrm{up}} = \hat{\mu} + \sigma \Phi^{-1}(1 - \alpha) .    (10)
The power of the test of μ with respect to the alternative μ′ = 0 is

M_0(\mu) = P\left(\hat{\mu} < \mu - \sigma \Phi^{-1}(1 - \alpha) \mid 0\right) .    (11)

Because μ̂ here follows a Gaussian distribution, the power can be written

M_0(\mu) = \Phi\left(\frac{\mu}{\sigma} - \Phi^{-1}(1 - \alpha)\right) .    (12)



This is illustrated in Fig. 2 for α = 0.05 and σ = 1. Since the cumulative distribution Φ is monotonically increasing and furthermore Φ^{-1}(1 − α) = −Φ^{-1}(α), Eq. (12) gives M_0(0) = α and M_0(μ) > α for all μ > 0, as can be seen in the figure.




Requiring the power

M_0(\mu) \ge M_{\min}    (13)

implies that the smallest μ to which one is sensitive is

\mu_{\min} = \sigma\left(\Phi^{-1}(M_{\min}) + \Phi^{-1}(1 - \alpha)\right) .    (14)

Figure 2: The power M_0(μ) for a test of μ with respect to the alternative μ′ = 0 (see text).

By combining Eqs. (10) and (14) one sees that μ_up is below μ_min if one finds

\hat{\mu} < \sigma \Phi^{-1}(M_{\min}) .    (15)

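Thus one finds the following expression for the power-constrained upper limit:

\mu_{\mathrm{up}}^{*} = \begin{cases} \sigma\left(\Phi^{-1}(M_{\min}) + \Phi^{-1}(1 - \alpha)\right) & \hat{\mu} < \sigma \Phi^{-1}(M_{\min}) , \\ \hat{\mu} + \sigma \Phi^{-1}(1 - \alpha) & \text{otherwise} . \end{cases}    (16)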



This is shown as a function of μ̂ in Fig. 3(a).
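Eqs. (12), (14) and (16) translate directly into code. The sketch below uses Python's standard-library NormalDist for Φ and Φ^{-1}; the function names and the defaults σ = 1, α = 0.05 and M_min = Φ(−1) are illustrative choices matching the figures.

```python
from statistics import NormalDist

ND = NormalDist()

def gaussian_power(mu, sigma=1.0, alpha=0.05):
    """Eq. (12): M_0(mu) = Phi(mu/sigma - Phi^{-1}(1 - alpha))."""
    return ND.cdf(mu / sigma - ND.inv_cdf(1.0 - alpha))

def gaussian_mu_min(sigma=1.0, alpha=0.05, m_min=ND.cdf(-1.0)):
    """Eq. (14): mu_min = sigma * (Phi^{-1}(M_min) + Phi^{-1}(1 - alpha))."""
    return sigma * (ND.inv_cdf(m_min) + ND.inv_cdf(1.0 - alpha))

def gaussian_pcl(mu_hat, sigma=1.0, alpha=0.05, m_min=ND.cdf(-1.0)):
    """Eq. (16): power-constrained upper limit for a Gaussian measurement mu_hat."""
    if mu_hat < sigma * ND.inv_cdf(m_min):          # condition of Eq. (15)
        return gaussian_mu_min(sigma, alpha, m_min)
    return mu_hat + sigma * ND.inv_cdf(1.0 - alpha)  # unconstrained limit, Eq. (10)

print(gaussian_power(0.0))                 # equals alpha
print(gaussian_mu_min())                   # about 0.645
print(gaussian_pcl(-2.0), gaussian_pcl(1.0))
```

The two branches of Eq. (16) meet at μ̂ = σΦ^{-1}(M_min), so the limit is a continuous function of μ̂, as in the solid curve of Fig. 3(a).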

For comparison, Fig. 3(a) also shows the upper limit without the power constraint (here called "classical") as well as the one obtained from the CLs procedure, which for this particular problem coincides with the Bayesian upper limit when using a constant prior for μ > 0.

Figure 3(b) shows the corresponding coverage probabilities for the upper limits. For PCL, this is 100% for μ < μ_min = σ(Φ^{-1}(M_min) + Φ^{-1}(1 − α)) = 0.64 (with σ = 1), and 95% otherwise. For CLs and Bayesian, the coverage probability is everywhere greater than 95%, approaching 95% as μ increases.
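The coverage statement for PCL can be checked with a short simulation, assuming as in Fig. 3 a Gaussian μ̂ with σ = 1, α = 0.05 and M_min = Φ(−1). Each toy draws μ̂ from a Gaussian centred on the true μ and records whether the power-constrained limit lies at or above μ. This is an illustrative sketch, not the code used to produce Fig. 3; the toy count and seed are arbitrary.

```python
import random
from statistics import NormalDist

ND = NormalDist()
Z = ND.inv_cdf(0.95)                        # Phi^{-1}(1 - alpha), alpha = 0.05
MU_MIN = ND.inv_cdf(ND.cdf(-1.0)) + Z       # Eq. (14) with sigma = 1, M_min = Phi(-1)

def pcl(mu_hat):
    """Eq. (16) with sigma = 1, written equivalently as max(unconstrained limit, mu_min)."""
    return max(mu_hat + Z, MU_MIN)

def coverage_mc(mu, n_toys=50_000, seed=7):
    """Fraction of toys with mu_hat ~ Gaussian(mu, 1) whose PCL is >= the true mu."""
    rng = random.Random(seed)
    covered = sum(1 for _ in range(n_toys) if pcl(rng.gauss(mu, 1.0)) >= mu)
    return covered / n_toys

for mu in (0.3, 0.6, 1.0, 2.0):
    print(mu, coverage_mc(mu))
```

Values of μ below μ_min ≈ 0.64 are covered in every toy, while above it the fraction settles near 0.95.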
Figure 3: (a) Upper limits from the PCL (solid), CLs and Bayesian (dashed), and classical (dotted) procedures as a function of μ̂, which is assumed to follow a Gaussian distribution with unit standard deviation. (b) The corresponding coverage probabilities as a function of μ.

7 Distribution of upper limit and choice of minimum power 

As mentioned above, we prefer to take the minimum power threshold M_min = Φ(−1) = 0.1587. From Eq. (15) one can see that if μ_up follows a Gaussian distribution, this choice of M_min corresponds to applying the power constraint if the data fluctuate below their expected value, under assumption of μ = 0, by more than one standard deviation. Here we will refer to a fluctuation at this level as 1σ (downward), regardless of the distribution of μ_up. In fact, the distribution of μ_up is often close to Gaussian, so the terminology is natural and convenient.

This choice of M_min can be motivated by the idea that a sufficiently small fluctuation should not be regarded as producing spurious exclusion of the type that the PCL and CLs procedures are intended to prevent. If, for example, one were to require M_min = 0.5, then one would impose the power constraint whenever the observed limit is found below the median, i.e., half of the time, which is not consistent with the notion of accepting small fluctuations. Therefore we consider a required power of 50% to be too extreme.

On the other hand, for any (unbiased) test, the power is always greater than or equal to the significance level α. So if one were to take M_min ≤ α then the result is the same as the unconstrained limit. Since one often takes α = 0.05, taking M_min = 0.05 would correspond to a 1.64σ downward fluctuation (i.e., Φ(−1.64) ≈ 0.05).

Sensitivity to the parameter μ corresponds to having a power M_0(μ) substantially larger than the significance level α. Therefore one would like to take M_min large with respect to α, while still allowing for a moderate downward fluctuation of the limit before imposing the power constraint. We therefore believe M_min = Φ(−1) ≈ 0.16 is a natural choice for use with α = 0.05. This allows for fluctuations up to the one-sigma level before imposing the power constraint, and the difference between α = 0.05 and M_min = 0.16 is sufficient to ensure a reasonable sensitivity. If one were to take, e.g., α = 0.1, as is done in some analyses, then one may consider a somewhat larger M_min to be appropriate.
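Under the Gaussian assumption of Sec. 6, the correspondence between M_min and the size of the allowed downward fluctuation can be tabulated directly: M_min = Φ(−n) for an n-sigma fluctuation, and by Eq. (14) μ_min/σ = Φ^{-1}(M_min) + Φ^{-1}(1 − α). The sketch below (illustrative function names) evaluates the three cases discussed in the text, with α = 0.05.

```python
from statistics import NormalDist

ND = NormalDist()
ALPHA = 0.05
Z = ND.inv_cdf(1.0 - ALPHA)      # about 1.645

def m_min_for_fluctuation(n_sigma):
    """M_min that imposes the constraint once mu_up falls more than n_sigma below its median."""
    return ND.cdf(-n_sigma)

def mu_min_over_sigma(n_sigma):
    """Eq. (14) divided by sigma: Phi^{-1}(M_min) + Phi^{-1}(1 - alpha), i.e. Z - n_sigma."""
    return ND.inv_cdf(m_min_for_fluctuation(n_sigma)) + Z

for n in (0.0, 1.0, Z):
    print(f"{n:.3f} sigma: M_min = {m_min_for_fluctuation(n):.4f}, "
          f"mu_min/sigma = {mu_min_over_sigma(n):.3f}")
```

With n = 0 (M_min = 0.5) μ_min sits at the median of μ_up under μ = 0, which is 1.645σ; with n = 1 it is about 0.645σ; and with n ≈ 1.645 (M_min = α) it falls to zero.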

In many searches for new phenomena, one may carry out the analysis for a range of parameters in the signal model. For example, when searching for the Higgs boson one may carry out the analysis for each value of the mass m_H. In this situation one can simply repeat the power-constraint procedure for each value of the signal model's parameters, as is illustrated in Fig. 4.
Figure 4: Illustration of the power-constrained limit as a function of a model parameter such as the Higgs boson mass m_H (see text).
In Fig. 4 the solid line represents the median value of the unconstrained upper limit μ_up under the background-only hypothesis, and the lower and upper dashed curves are the 0.16 and 0.84 quantiles of the distribution of μ_up. The dotted curve in Fig. 4 represents a possible outcome for the unconstrained limit μ_up. The minimum power is taken to be M_min = Φ(−1) = 0.16, and thus the power-constrained limit is the greater of the dotted and lower dashed curves, as indicated by the shaded curve.
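Because Eq. (7) gives M_0(μ_min) = P(μ_up < μ_min | 0) = M_min, the value μ_min at each model point is the M_min quantile of the distribution of μ_up under μ = 0, which is why the lower dashed curve plays this role. The sketch below applies Eq. (8) point by point. The three "mass points", their scale factors, the observed limits and the Gaussian toy distributions are all hypothetical.

```python
import math
import random

M_MIN = 0.1587   # Phi(-1), the threshold recommended in the text

def empirical_quantile(values, p):
    """Smallest sample value v such that a fraction >= p of the sample is <= v."""
    s = sorted(values)
    return s[max(0, math.ceil(p * len(s)) - 1)]

def pcl_band(observed, toys_by_point, m_min=M_MIN):
    """Per model point: mu_min is the m_min quantile of toy mu_up values under mu = 0,
    and the PCL is max(observed mu_up, mu_min), Eq. (8)."""
    result = []
    for obs, toys in zip(observed, toys_by_point):
        mu_min = empirical_quantile(toys, m_min)
        result.append(max(obs, mu_min))
    return result

# Hypothetical scan over three mass points; toys mimic a Gaussian mu_up with median 1.645*s.
rng = random.Random(3)
scales = [1.0, 2.0, 4.0]
toys_by_point = [[s * (rng.gauss(0.0, 1.0) + 1.645) for _ in range(20_000)] for s in scales]
observed = [2.5, 0.6, 1.0]
print(pcl_band(observed, toys_by_point))
```

At the first point the observed limit is kept; at the other two it lies below the lower dashed curve, so the constraint replaces it by roughly 0.645 times the scale factor.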

8 Treatment of nuisance parameters 

In many analyses, the probability model that describes the data is not uniquely specified by the parameter (or parameters) of interest, but rather also contains nuisance parameters. That is, the values of these parameters are not known a priori and they must be fitted using the data. For concreteness suppose the model is characterized by a strength parameter μ and a set of nuisance parameters θ = (θ_1, ..., θ_N).

The nuisance parameters complicate the present problem in two ways. First, they make it difficult to construct an unconstrained interval for the parameter of interest that has the correct coverage probability for all values of θ. This problem has been widely discussed in recent years, e.g., Ref. [6]. Many of the proposed procedures give intervals with correct coverage for some values of θ, but only approximate coverage elsewhere. For example, an approximate solution based on the profile likelihood ratio test is discussed in Refs. [7]. For the present discussion we will assume that a test procedure that gives an unconstrained interval has been chosen. Its coverage probability may or may not be exactly equal to the nominal confidence level for all values of θ.

Of more direct concern for the present paper is the fact that the power of the test of μ with respect to the no-signal alternative will depend in general on the nuisance parameters θ. As the power is intended to represent the probability, under assumption of the no-signal model, to reject a given value of μ, we take the values of θ that are in best agreement with the actual data under assumption of μ = 0. We denote these as θ̂(0), i.e., they are the conditional estimators for θ under assumption of μ = 0.
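As a hypothetical illustration (the model is an assumption, not taken from the text), consider a signal-region count n ~ Poisson(μs + b) and a control-region count m ~ Poisson(τb), with s and τ known and the background b as the single nuisance parameter. Maximizing the likelihood with μ fixed to 0 gives the conditional estimator b̂(0) = (n + m)/(1 + τ). The sketch evaluates M_0(μ) at that point in nuisance-parameter space; for simplicity it defines the critical region for each μ with b held at b̂(0), which is a simplification of a full treatment such as the profile likelihood ratio.

```python
import math

def poisson_cdf(k, nu):
    """P(n <= k) for a Poisson mean nu; returns 0.0 for k = -1."""
    term, total = math.exp(-nu), 0.0
    for i in range(k + 1):
        if i > 0:
            term *= nu / i
        total += term
    return total

def conditional_b_hat(n, m, tau):
    """Conditional MLE of b given mu = 0, for n ~ Pois(mu*s + b) and m ~ Pois(tau*b)."""
    return (n + m) / (1.0 + tau)

def critical_count(mu, s, b, alpha=0.05):
    """Largest k with P(n <= k | mu*s + b) <= alpha (low counts disfavour mu); -1 if none.

    Simplification: b is treated as known here rather than profiled.
    """
    k = -1
    while poisson_cdf(k + 1, mu * s + b) <= alpha:
        k += 1
    return k

def power_at_conditional_b(mu, s, n_obs, m_obs, tau, alpha=0.05):
    """M_0(mu) evaluated at the nuisance-parameter point b = b_hat(0) from the observed data."""
    b0 = conditional_b_hat(n_obs, m_obs, tau)
    k_c = critical_count(mu, s, b0, alpha)
    return poisson_cdf(k_c, b0)   # P(n <= k_c | mu = 0, b = b0)

# Hypothetical observed data: n = 10, m = 20, tau = 2, signal yield per unit mu s = 1.
for mu in (0, 5, 10, 15):
    print(mu, power_at_conditional_b(mu, 1.0, 10, 20, 2.0))
```

Because b̂(0) depends on the observed counts, a different data set gives a different power curve, which is the data dependence of M_0(μ) described in the text.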

As a consequence of this choice, the power M_0(μ) becomes a function of the actual data, since the data are used to determine values for the nuisance parameters. Thus the range of μ values where one has sufficient sensitivity also depends to some extent on the data. This may seem counter-intuitive, since the power of a specific test, i.e., at a given point in (μ, θ)-space, is independent of the data. But there is a certain power M_0(μ) for every point in θ-space, and one uses the data to choose the point at which one quotes the power.
Alternatively, one could require that the power is greater than or equal to the minimum threshold for all values of the nuisance parameters in a specified range. In this way the set of μ values for which one has sufficient sensitivity would not depend on the data. As this would entail considerable computational effort, however, we prefer to define the power using a specific point in the nuisance-parameter space as described above.

9 Summary and conclusions 

We propose a power-constraint procedure for modifying confidence limits so that parameter values to which one has little or no sensitivity are not excluded. The sensitivity is measured using the power of the test of the parameter with respect to the no-signal alternative. The coverage probability of the resulting limits is equal to the nominal confidence level (e.g., 95%) for parameter values to which one's sensitivity is above a given threshold, and 100% if the sensitivity is below the threshold. This can be contrasted with the CLs procedure, for which the coverage probability is always greater than the nominal confidence level by an amount that varies continuously as a function of the assumed parameter value.

The power used for the sensitivity threshold is a matter of convention, but we recommend taking this to be M_min = Φ(−1) ≈ 0.16. This is consistent with allowing for reasonably small downward fluctuations of the data by drawing the boundary at the one-sigma level. Allowing fluctuations of more than 1.64σ would mean the power constraint is never imposed (for a 95% confidence level limit), and requiring M_min = 0.5 would impose the power constraint half of the time, including cases with only an infinitesimal downward fluctuation.

The PCL procedure is easily extended to problems with nuisance parameters. There we define the power with respect to the background-only (μ = 0) model using the conditional estimates of the nuisance parameters given μ = 0.

The PCL procedure is particularly useful in cases where spurious exclusion is problematic, such as when a one-sided test is inverted to give an upper limit. It can be applied, however, to any confidence interval, including those based on inversion of a likelihood-ratio test (i.e., Feldman-Cousins intervals [8]).

When reporting results, we recommend showing both the constrained and unconstrained limits. In this way one can know whether a given parameter value is not rejected because the data are in good agreement with it, or rather because it is a value to which the sensitivity is deemed too low to allow exclusion.